Out-of-Order Memory Accesses Using a Load Wait Buffer

Authors

Shelley Chen

Jennifer Morris

Abstract

Many dynamic scheduling techniques take advantage of out-of-order instruction execution to hide memory access latency. However, as the disparity between processor and memory speeds increases, delays in the load-store queue become more of a bottleneck. One way to mitigate these delays is to allow loads and stores to execute and retire from the load-store queue (LSQ) out of order. Unfortunately, when the LSQ fills with pending loads, other loads and stores are prevented from entering the buffer to be retired. In addition to out-of-order execution of loads and stores, we propose temporary removal of long-latency, pending loads to a separate load wait buffer (LWB), similar to the waiting instruction buffer (WIB) proposed by Lebeck et al. [1]. Simulation results show successive increases in benchmark IPC with out-of-order loads, out-of-order loads and stores, and out-of-order loads and stores with an LWB. The design with the LWB shows up to 303% speedup in IPC.
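
The blocking effect described in the abstract can be illustrated with a toy cycle-count model. This is only a sketch under stated assumptions, not the authors' simulator: the function name `simulate`, the one-dispatch-per-cycle rule, the `miss_threshold` parameter, and the simplification that a parked load completes directly from the LWB (rather than re-entering the LSQ) are all hypothetical choices made for illustration.

```python
from collections import deque


def simulate(latencies, lsq_size, lwb_size=0, miss_threshold=10):
    """Return the cycles needed to complete all memory ops in a toy model.

    latencies: completion latency (in cycles) of each op, in program order.
    Ops enter the LSQ one per cycle when a slot is free and may complete
    out of order. If lwb_size > 0, ops with at least miss_threshold cycles
    remaining are moved to the LWB, which frees their LSQ slot.
    """
    pending = deque(latencies)
    lsq = []  # remaining cycles of ops holding LSQ entries
    lwb = []  # remaining cycles of ops parked in the LWB (assumed model)
    cycle = 0
    while pending or lsq or lwb:
        cycle += 1
        # Dispatch at most one op per cycle if the LSQ has room.
        if pending and len(lsq) < lsq_size:
            lsq.append(pending.popleft())
        # Park long-latency ops in the LWB while it has free entries.
        for remaining in list(lsq):
            if remaining >= miss_threshold and len(lwb) < lwb_size:
                lsq.remove(remaining)
                lwb.append(remaining)
        # Advance time by one cycle; ops with one cycle left complete now.
        lsq = [r - 1 for r in lsq if r > 1]
        lwb = [r - 1 for r in lwb if r > 1]
    return cycle


if __name__ == "__main__":
    # Eight long-latency loads followed by eight short ops (made-up stream).
    stream = [100] * 8 + [1] * 8
    print("LSQ only:", simulate(stream, lsq_size=4))
    print("LSQ + LWB:", simulate(stream, lsq_size=4, lwb_size=8))
```

In this made-up stream, the LSQ-only run stalls once four long-latency loads occupy every entry, so the remaining loads cannot even start until one of them completes. With the LWB, all eight long loads overlap and the short ops are not held up behind them. The numbers come from the toy model only and say nothing about the IPC results reported in the paper.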

Similar resources

Autonomous Instruction Memory Equipped with Dynamic Branch Handling Capability

portable information appliances, the extraordinary power consumption ratio of memory accesses promotes importance of efficient memory system design to an ultimate. We address the following issues: how to minimize memory bandwidth requirement for instruction accesses, and how to minimize memory access delay, again for instruction accesses. Then we propose to move dynamic branch handler (e.g., br...

Accurate analysis of memory latencies for WCET estimation

These last years, many researchers have proposed solutions to estimate the Worst-Case Execution Time of a critical application when it is run on modern hardware. Several schemes commonly implemented to improve performance have been considered so far in the context of static WCET analysis: pipelines, instruction caches, dynamic branch predictors, execution cores supporting out-of-order execution...

Data Prefetching by Exploiting Global and Local Access Patterns

This paper proposes a new hardware prefetcher that extends the idea of the Global History Buffer (GHB) originally proposed in [1]. We augment the GHB with several Local History Buffers (LHBs), which keep the memory access information for selective program counters. These buffers can then be queried on cache accesses to predict future memory accesses and enable data prefetching using novel detec...

Optimized On-Chip-Pipelining for Memory-Intensive Computations on Multi-Core Processors with Explicit Memory Hierarchy

Limited bandwidth to off-chip main memory tends to be a performance bottleneck in chip multiprocessors, and this will become even more problematic with an increasing number of cores. Especially for streaming computations where the ratio between computational work and memory transfer is low, transforming the program into more memory-efficient code is an important program optimization. On-chip pi...

Failure-Oblivious Computing and Boundless Memory Blocks

Memory errors are a common cause of incorrect software execution and security vulnerabilities. We have developed two new techniques that help software continue to execute successfully through memory errors: failure-oblivious computing and boundless memory blocks. The foundation of both techniques is a compiler that generates code that checks accesses via pointers to detect out of bounds accesse...

Publication date: 2003
